Dynamical differential covariance recovers directional network structure in multiscale neural systems

Significance We sense, move, and think by dynamical interactions between neurons. It is now possible to simultaneously record from many individual neurons and brain regions. Methods for analyzing these large-scale recordings are needed that can reveal how the patterns of activity give rise to behavior. We developed dynamical differential covariance (DDC), an efficient, intuitive, and robust way to analyze these recordings and validated it on simulations of model neural networks where the ground truth was known. It can estimate not only the presence of a connection but also which direction the information is flowing in a network between neurons or cortical areas. We applied DDC to recordings from functional magnetic resonance imaging in humans and confirmed predicted connectivity with direct measurements.


Significance
We sense, move, and think by dynamical interactions between neurons. It is now possible to simultaneously record from many individual neurons and brain regions. Methods for analyzing these large-scale recordings are needed that can reveal how the patterns of activity give rise to behavior. We developed dynamical differential covariance (DDC), an efficient, intuitive, and robust way to analyze these recordings and validated it on simulations of model neural networks where the ground truth was known. It can estimate not only the presence of a connection but also which direction the information is flowing in a network between neurons or cortical areas. We applied DDC to recordings from functional magnetic resonance imaging in humans and confirmed predicted connectivity with direct measurements.
information (15) and transfer entropy (16,17), have also been used to infer directed connectivity. This class of methods can in principle improve inference accuracy, but these methods require much more computation or a bigger sample size than covariance methods and do not scale well.
EC models such as dynamic causal modeling (DCM) (8) and Bayes net models (11,(18)(19)(20) search the feasible graph space and fit the entire dataset to every hypothesis. Bayes net methods fit multivariate time recordings through static probability distributions without time dependency while DCM models the observations explicitly through the dynamical process that generates the recorded signals and has been the mainstream method to infer EC. In fact, linear dynamic differential covariance (DDC) shares the same assumption as the linearized DCM (7). The graph searching process requires even more computation, severely limiting the size of the network that can be analyzed.
We previously introduced differential covariance (dCov) (21,22), a directed FC estimation method, and highlighted the performance of two matrices, Δc, which is the correlation between the derivative signal and the signal itself, and Δp, which is the partial covariance between them. In simulated test cases, dCov detected network connections with higher sensitivity than many of the methods reviewed in Smith et al. (6). In this paper, we derive a direct link between dCov and dynamical models of network activity. This leads to a class of estimators called DDC based on an interaction matrix that appears in the equations for a dynamical system. DDC provides a simple and efficient estimate of directed connectivity by combining a model-based approach from EC with computationally efficient methods from FC. DDC is based on an implicit generative model that approximates brain dynamics and entails causality in the context of control theory.
In the following sections, we first analytically show that DDC provides unbiased estimates regardless of the noise structure, without assuming that the data are stationary. These favorable statistical properties were numerically confirmed in networks with both linear and nonlinear dynamics. We then show that DDC can infer the ground-truth connections and their direction in multiscale neural network simulations. The inference accuracy and efficiency were benchmarked against most estimators mentioned above (Table 1). Finally, we apply DDC to resting-state functional magnetic resonance imaging (rs-fMRI) recordings from over 1,000 subjects and show that the extracted connectivity closely matches the structural connectivity measured by diffusion MRI.

Results
A. DDC. Models of neural dynamics span a wide range of scales. At the microscopic level, the voltage trace, calcium dynamics, and firing rate of a single neuron are highly nonlinear. These dynamics Covariance matrix P Partial covariance matrix L1-reg L1-regularized partial covariance matrix L2-reg L2-regularized partial covariance matrix Partial-MI Partial mutual information c-Granger Conditional Granger causality C spk Spike-train cross-correlogram-based connectivity Δc Differential covariance matrix Δp Partial differential covariance matrix ΔL Linear DDC ΔR General nonlinear DDC ΔReLU Nonlinear DDC with ReLU nonlinearity are often modeled using biophysical models based on voltagegated ion channels. In contrast, at the macroscopic level the collective activity of a population of neurons and interactions between brain regions can be approximated by linear dynamics because of ensemble averaging (7,(23)(24)(25). For example, Nozari et al. (25) showed that compared to other sophisticated nonlinear families of models, the simple linear autoregressive model performed best on modeling fMRI and intracranial electroencephalography (iEEG) recordings from hundreds of human subjects. We first propose a linear dynamical model in Eq. 1 for global recordings and a nonlinear dynamical model in Eq. 2 for local neural recordings: where the column vector x is the neural activity, such as the membrane voltage or fMRI signal; W is the square connectivity matrix; and R(x) is a nonlinear response function. Taking the outer product of Eqs. 1 and 2 with x and time averaging , yields where ΔL and ΔR are DDC estimators for W. DDC is the least-squares error estimator (LSE) of W under the assumed system equations (M aterials and M ethods, 1.G.1). Potentially, the extensive statistical literature about LSE (26) can be applied to refine DDC estimation. For example, the pseudoinverse could be used if the covariance matrix is rank deficient. More details are provided in Discussion.
The origin of the linear DDC estimator ΔL from a dynamical model provides an intuition for its effectiveness in estimating W as the product of two matrices: The first term is differential covariance, which carries information about sources and sinks. In a neuron the sink is the inward current from synaptic inputs in the receiving area and in brain imaging it is related to changes in surrogates for local brain activity, whereas the source is the activity level in the sending area. In the second term, an entry in the partial covariance matrix is zero if and only if the residual correlation between x i and x j is zero, which cancels the influence of common sources. Robust estimation of directional interactions becomes possible by combining signals from sources and sinks and canceling signals from common sources. The multiplicative combination of the two terms yields better estimates than either one alone.
A family of estimators arises from the DDC estimator ΔR for nonlinear dynamical systems for different R(x). Estimators can be adapted to the filtering effects from different recording techniques, such as the slow kinetics of calcium signals, by choosing the nonlinear function R(x) appropriately. Here, we use the rectified linear unit (ReLU), parameterized by a threshold, θ, that is often used in artificial neural networks (27), yielding ΔReLU as the corresponding nonlinear DDC estimator. The threshold can be set to optimize performance. Intuitively, the ReLU function rectifies low-magnitude "noise" and retains larger signals.

B. DDC Provides Unbiased Estimation for Nonstationary Data.
DDC estimators have several favorable statistical properties. First, given the correct neural model for the generative process, DDC provides an unbiased estimation of the connectivity matrix. In Material and Methods, 1.G.2, we show analytically that in systems governed by stochastic differential equations, DDC gives unbiased estimates of connectivity W in the sense that the DDC estimation converges to the ground truth given a sufficient number of trials. In addition, the estimation remains unbiased regardless of the noise structure (D in Eq. 13)-whether or not the added noise is correlated. To numerically confirm this result, we simulated a confounder network governed by linear dynamics (Fig. 1A) and estimated the connectivity from simulated time traces (Fig. 1B). It appears that ΔL was able to recover the ground-truth network structure, within a margin of error. To further investigate the origin of error, we orthogonally decomposed the total error (M aterials and M ethods) across trials into bias and variance (Fig. 1D). The bias part is due to the intrinsic property of the estimation method while the variance part drops as the number of estimation trials increases. Across 50 simulations, ΔL achieved the smallest estimation error, and more importantly, its bias over variance ratio (θ b ) was also the lowest (Fig. 1D). Thus, both analytical and numerical results confirmed that DDC estimation is unbiased.
We also quantified the variance and bias over a range of data size and observational noise. DDC consistently had the least estimation error regardless of the size of the dataset (Fig. 1C). In contrast, inference bias for covariance (Cov), partial covariance (P), differential covariance (Δc), and partial differential covariance (Δp) diverged as data volume increased, introducing a systematic error. Regarding noise tolerance (Fig. 1C), the performance of dCov matrices (Δc and Δp) rapidly deteriorated with increasing noise, probably due to the inaccuracies in the computation of derivative. However, DDC remained robust despite these inaccuracies.
Second, we also prove in M aterials and M ethods that DDC can be used to analyze nonstationary data whose higher-order statistics vary with time. In practical neural data processing, stationarity is often assumed when estimating the covariance matrix through sampling over time. However, this may not be a valid assumption because neural recordings can be quite nonstationary due to fluctuating brain states owing to neuromodulation and varying sensory inputs. There is a need for methods that can analyze nonstationary data. In the derivation of DDC, stationarity was not required because relationships imposed by the system equation hold at every time step, regardless of the probability distribution of the process. To verify that DDC does not depend on stationarity, we simulated a two-state dynamical system (M aterials and Methods) whose connectivity (shown in Fig. 1A) remained time invariant while the noise structure was switched between states.
A B  The analytical solution of the covariance structure is given by Eq. 21, as plotted in Fig. 1E. Using nonoverlapping sliding windows, we obtained the sample covariance and ΔL estimation of the connectivity matrix (Fig. 1F ). The time-varying covariance matrix confirmed that the time trace is not stationary and thus sample covariance estimation fails to capture the true connectivity profile. On the other hand, ΔL consistently and accurately estimated the true connectivity matrix, even in window III (Fig. 1F ) where state switching occurred.
C. DDC Inferred the Existence and Direction of Multiple Network Structures and Dynamics. We applied DDC to a number of dynamical systems, including stochastic nonlinear systems and deterministic chaotic systems. The objective was to show the extent to which the connection strengths and directions can be estimated by DDC in a wide class of dynamical systems, especially by the linear model. As a proof of principle, we applied DDC to three-node networks with varying dynamics and network structures (Fig. 2). The chain motif (Fig. 2 A and B) and confounder motif (Fig. 2C) were chosen because they both have a node pair (red dashed line) that is highly correlated but with no physical connection, which is an ideal test of whether DDC can "explain away" spurious correlations. We simulated both linear-and sigmoid-  (E and F) Estimation error, quantified as normalized Euclidean distance between the ground truth and estimation, over 50 trials for both linear and nonlinear models benchmarked with state-of-the-art network inference algorithms. c-Granger, conditional Granger causality; Cov, sample covariance estimation; Δc, differential covariance matrix; Δp, partial differential covariance matrix; L1/L2-reg, L1/L2-regularized partial covariance matrix; P, partial covariance estimation; partial-MI, partial mutual information; std, SD (standard deviation). based nonlinear dynamics (M aterials and M ethods). We intentionally introduced a model mismatch-"sigmoid"-based generative dynamics and "ReLU" nonlinearity for estimation-because in practice, a perfect match is not possible. Both ΔL and ΔReLU correctly inferred the existence and direction of the ground-truth connections ( Fig. 2) while the covariance matrix (Cov) failed to explain away false positive connections and partial covariance (P) was not able to determine the directionality of connections (SI Appendix, Fig. S2B). We further benchmarked DDC performance with more sophisticated methods, including regularized partial covariance (L1-/L2-reg), partial mutual information (MI), and conditional Granger causality (c-Granger) (Fig. 2 E and F ). L1-reg and c-Granger exhibited comparable performance in linear simulations but their performance deteriorated for models with nonlinear dynamics. The heavy computation burden for optimization or model fitting makes these two methods difficult to scale up (see computation time in SI Appendix, Fig. S2E for a 50-node task and in Fig. 3D for a 200-node task). Because regularized partial covariance and c-Granger assume stationarity, they performed poorly on the state-switching case shown in Fig. 1 E and F. DDC was also applied to a larger network consisting of 50 nodes and structured by a combination of confounder and chain motifs (SI Appendix, Fig. S2A). As in the small network case, ΔL and ΔReLU accurately estimated the existence and direction of connections (SI Appendix, Fig. S2B). L1-reg and c-Granger achieved good performance for models with linear dynamics but performed poorly when the system was nonlinear (SI Appendix, Fig. S2D). In this 50-node estimation task, c-Granger computation time was more than three orders of magnitude greater than that of the other methods (SI Appendix, Fig. S2E). Estimation accuracy improved with larger datasets for most of the methods we tested (SI Appendix, Fig. S2C).
Finally, we asked whether DDC can track the information flow in a nonlinear Rössler system, which has a deterministic chaotic attractor. The three equations for this system in Fig. 2D have a nonlinear bidirectional confounder motif. ΔL and ΔReLU correctly identified direct connections and ignored the strong correlations between x 2 and x 3 . This suggests that DDC estimation is robust to model mismatch and can faithfully reflect the direct interactions in the system equation despite the unpredictability of the chaotic system.

D. DDC Identified Ground-Truth Connections with High Sensitiv-
ity across Multiscale Neural Networks. Next, we tested DDC on a network model with 200 leaky integrate-and-fire (LIF) neurons (29). These neurons integrate exponentially filtered synaptic inputs until the membrane potential reaches a threshold, which triggers a spike and a reset to resting membrane potential. The connectivity matrix was a globally connected Erdös-Rényi random graph with uniform connection strengths, to test methods for extracting the existence and direction of network edges (M aterials and M ethods). We used the classification performance of true connections with increasing binarization thresholds (M aterials and M ethods) as a surrogate measure for how well the connectivity of the entire matrix was estimated.
Graphs with a range of sparsity and connection strengths were simulated and DDC was applied to the subthreshold membrane potentials (M aterials and M ethods, representative traces shown in Fig. 3A). Performance was quantified by the area under the curve (AUC) of specificity versus sensitivity ( Fig. 3 B and C). Directed estimation methods (c-Granger, dCov, and DDC) have higher AUC values because the ground-truth matrices were not symmetric. ΔL and ΔReLU were significantly (P <0.001, ranksum test) better than all other methods except c-Granger, which reached a comparable performance level. However, one trial of c-Granger estimation took ∼60 h on a computing cluster with 32 cores, which makes it impractical to analyze the large-scale neural recordings that have become available. Both the linear and nonlinear estimators presented similar curves (Fig. 3B) because the threshold selection process (M aterials and M ethods) of ΔReLU favored a low threshold, approximating a linear model. Possible explanations and improvements of the nonlinear estimator are explored in Discussion. Remarkably, the linear DDC estimator retained its high level of performance in this highly nonlinear simulation and was robust to a broad range of network configurations with different sparsity levels and connection strengths (Fig. 3E). In general, the additional operation of taking partial covariance compared to pairwise covariance did not improve the performance (compare Cov, Δc with P, Δp in Fig. 3D) because in a sparsely connected network structure the influence of indirect connections is weak. Interestingly, Δp had very high sensitivity (true positive rate) even when very few connections were thresholded as positive. This might be due to its sparse estimation (21,22).
We also tested the methods on simulated recordings of macroscopic neural activities based on the reduced Wong-Wang model of the resting state (30). The interaction matrix involves both self-excitation and experimentally measured long-range structural connectivity. We used two datasets of structural connectome in the simulation: the Human Connectome Project (HCP) diffusion MRI (dMRI) connectome (31) and the diffusion tensor imaging (DTI) dataset built in the virtual brain simulator (TVB) (32). The details of the two datasets and the simulation parameters are summarized in Table 2. Most physiological parameters were taken from Deco et al. (30) unless otherwise noted in Table 2. In the dMRI simulation, the relative connection strength and driving current were slightly tuned to resemble the bifurcation plot of the full spiking network (figure 2 in ref. 30). Due to the spike and reset process, the numerical derivative of membrane potential jumps transiently during spike events, thus significantly contributing to DDC estimators. We suspected the DDC estimators performed well because of these spike events.
To test this possibility, we split the time trace and its derivative trace into active time points where at least one neuron spikes and quiet time points where no neuron emits a single spike. We extended the simulation to 40 s to ensure a sufficient data volume and quantified DDC performance for both parts (Fig. 3E, Center and Right). Estimation of ΔL using active time points was as accurate as that using full time points. However, the spike train cross-correlogram was itself not enough to recover the connectivity as indicated by the low AUC value of C spk in Fig. 3C and SI Appendix, Fig. S3C). This points toward the importance of including voltage fluctuations within the active period window, in addition to the binary spike train.
For the HCP dMRI dataset, we simulated neural dynamics based on both population-level connectivity (Fig. 4A) and 1,065 individual connectomes. We thresholded the structural connectivity matrix to a sparsity of 5% because it is difficult to achieve above chance performance for any methods when the connectivity matrix is too dense. Every node in the thresholded graph is still linked to at least five other nodes as shown in SI Appendix, Fig. S4A. The overall performance was quantified by c-sensitivity (M aterials and M ethods), which is a measure of the separation between the estimated value of true positive connections and the estimated value of true negative connections. (c-sensitivity = 1 when the true positives were completely separated from the others by the estimation.) Chance-level performance was calculated by replacing the ground-truth matrix with either realizations of the Erdös-Rényi graph with the same sparsity level (control 1) or those of degreepreserving randomized graphs (control 2).
In the population-level connectivity simulation, ΔL had the best performance followed by (regularized) precision matrices and c-Granger estimation (Fig. 4C), probably because the reduced Wong-Wang model exhibited linear fluctuations around the stable point (30). For c-Granger, only 10% of the data points were used since it took about 46 h to run and full-length estimation would take approximately 1 mo. The ΔL estimated connection strength appeared to be negatively correlated with fiber tract length, in line with most tractography reconstruction observations Population simulation c-sensitivity c-sensitivity Fig. 4. Estimation performance for a macroscopic brain surface model. (A) Populationaveraged structural connectivity measured by dMRI from one hemisphere. The matrix was threshold into a binary matrix where only the strongest 5% of connections (yellow entries) were kept. (B) Correlation between absolute value of ΔL estimate and fiber tract length in the model. Time series were simulated for 1,000 s at 1,000 Hz using a reduced Wong-Wang model supported by populationaveraged dMRI connectivity from one hemisphere. (C) Estimator performance (observed) quantified through c-sensitivity, a measure of the separation between estimated values of true positive connections and true negative connections. Control 1 was calculated based on 1,000 realizations of an Erdös-Rényi graph with same sparsity. Control 2 was calculated based on 1,000 trials of a degree-preserving randomization algorithm (28). Due to computation limit, "c-Granger" estimation was based on 10% of the simulated data. (D) Estimation performance (c-sensitivity) in simulations supported by 1,065 individual dMRI connectivities (one hemisphere).
In another simulation using the built-in structural connectivity in the virtual brain simulator (SI Appendix, Fig. S6), ΔL had the highest performance followed by c-Granger and ΔReLU. The c-Granger method had comparable performance but the computation time was almost four orders of magnitude longer, making it impractical. The raw ΔL and ΔReLU matrices uncovered the strongest connections (red arrows) in the ground-truth matrix (SI Appendix, Fig. S6B). Similarly, only the strongest long-range connections were included, because all methods failed to reach significance for graded anatomical connectivity.

E. DDC Estimation Is Reliable When Applied to rs-fMRI Record-
ings. To critically test DDC on neural data, we applied DDC to rs-fMRI recordings obtained from the HCP. The imaging voxels were parcellated through group independent component analysis (ICA) (M aterials and M ethods), where each independent component (IC) parcellation, shared across subjects, is composed of voxels with similar dynamics. ICs are mainly composed of spatially proximate voxels, forming anatomically recognizable brain regions (SI Appendix, Fig. S9). In addition, we focused on the first 46 ICs encompassing over 40% of cortical voxels (SI Appendix, Fig. S8A) to match cortical measurements using dMRI. Dual regression (M aterials and M ethods) assigned unique ICA-parcellated bloodoxygen-level-dependent (BOLD) signals to each subject, which were treated as nodes for DDC analysis.
We first established the reliability of ΔL estimation, that is, the number of data points required for estimates to become stable and whether the method provided consistent estimates across sessions within one subject. Within one subject, we calculated the correlation of ΔL estimation using all data points with that estimated using different amounts of partial data. The correlation value is 0.6 using data size equivalent to one recording session and gradually increases as more data are included (Fig. 5A). Compared to sample covariance estimation (SI Appendix, Fig. S7), the data volume requirement is relatively higher probably due to the need to calculate the derivative. For most subjects, 2,400 data points, or equivalently, two sessions, were needed to reach a correlation of 0.8. We further plotted the correlation of ΔL estimation using a concatenation of two sessions within and between 10 randomly selected subjects (Fig. 5B). The block structure on the diagonal indicated a higher intraindividual similarity of ΔL across sessions.  There are important connections shared ubiquitously by the majority of subjects (i.e., "backbone connection"). To systematically select for them, the barplot (Right) represents the number of individuals that highlighted a specific connection. The red dashed line represents 90% of total subjects. (D) ΔL backbone connection shared by over 90% of subjects and their IC parcellations registered on an MRI template. Arrows indicate the estimated connection direction and the red arrows indicate IC pairs that are anatomically close. Numbers indicate the IC indexes corresponding to SI Appendix, Table S1 and Fig. S9.
The population statistics from 1,003 subjects (Fig. 5C) confirmed a significantly higher within-subject similarity and a relatively lower between-subject correlation (compare to SI Appendix, Fig. S7C).
In general, ΔL estimation is more variable than sample covariance estimation. To quantify this, we adapted subject identification analysis (33) (Fig. 5D): FC estimation from two sessions together with their subject labels were constructed as database matrices. The identity of a target FC matrix, estimated from the remaining two sessions, was inferred based on the correlation between it and database matrices. To obtain an asymptotic behavior, we adopted a soft criterion for successful identification events. If the true identity belongs to the candidate pool, composed of the top m subjects with highest correlation, then the identification process is successful.
As expected, the identification (ID) accuracy increased as the sample pool size (m) increased (Fig. 5E). Cov and P showed very high ID accuracy as reported in ref. 33. DDC estimators (Δc, ΔL, and ΔReLU) showed an accuracy of around 65% and then reached 90% as pool size increased to 30, which is still acceptable given there are over 1,000 subjects in total. Dynamic FC has been observed previously (34,35), and identifying its origin might reveal fundamental aspects of brain dynamics.

F. DDC Consistently Identified Structurally Connected Brain
Regions. The average and SD of the estimated interaction matrices across subjects are shown in Fig. 6 A and B, respectively, where ΔL and ΔReLU were sparser than the covariance matrix. Two nodes (indicated by red arrows) in ΔL appeared to have a broader range of interactions. They were anatomically registered as "occipital pole" and "medial occipitotemporal gyrus," reflecting the large proportion of visual ICs in the network (SI Appendix, Fig. S9 and Table S1). We binarized the estimated matrices based on significance levels to minimize the influence of nonsignificant spurious connections due to random fluctuations in the signal, thereby increasing noise tolerance. The significance test was determined by an autoregressive bootstrapping procedure. The null hypothesis is that each time series was generated by an independent autoregressive process, which was fitted to the observed data. This ensured that the signature power spectra of fMRI recordings were preserved (M aterials and M ethods). The significant ΔL connections (yellow entries, P < 0.01) across subjects are shown in Fig. 6C. These "backbone connections" shared across a majority of the subjects could be flexibly tuned for network sparsity level. We adopted a strict criterion because we were interested in the most conserved connections shared by over 90% of subjects (red dashed vertical line in Fig. 6C). Their IC parcellations were registered on an MRI template (Fig. 6D). In this case, backbone connections were identified between ICs from the same anatomical region as well as interregional interactions (marked in red).
To quantify the extent to which estimated FCs matched the structural connectivity, we further processed dMRI measurements from the HCP dataset (31) to obtain individual-level IC-based dMRI matrices (SI Appendix, Fig. S8 and M aterials and M ethods). At the IC level, dMRI strengths were bimodal (Fig. 7A), indicating a clear separation between the strong and weak connections. ΔL identified connections with higher dMRI strength compared to those chosen by the covariance matrix (Fig. 7B). Fig. 7C shows the increasing average dMRI strength for decreasing binarization threshold, linking the significance of rs-fMRI to dMRI connectivity for all methods and confirming their biological relevance. DDC uncovered connections with significantly higher dMRI strength values than covariance-based methods (Fig. 7C) and also identified a larger proportion of strong connections (Fig. 7D).

Discussion
DDC is a promising family of estimators for analyzing the connectivity underlying large-scale brain recordings. Because DDC is derived directly from dynamical system equations that govern neural interactions, no optimization or model fitting is required. DDC is a practical and intuitive method that can be computed rapidly and scales well with the number of recording sites. Unlike methods based on covariance, which are inherently symmetrical, DDC can detect directional interactions and obtain statistical estimates of causality. DDC uncovered ground truth when applied to dynamical simulations of network models and significantly improved estimates of strong dMRI connectivity from rs-fMRI recordings compared with covariance methods. Further analysis is needed to probe the ability of DDC methods to uncover weaker connectivity. Structural anatomy is foundational in neuroscience and functional anatomy that is consistent with the underlying structural anatomy makes possible more accurate predictions about information flow in specific pathways. The improved performance of DDC on benchmarks had some limitations. First, the parcellation used for analyzing fMRI was based on ICA, which was different from dMRI analysis that used atlas-based parcellation. The additional step to link them through voxel coordinates complicates interpretation of the results. It would have been more parsimonious to use the same parcellation to process both the fMRI and dMRI data as raw voxels, but this would have required reprocessing the raw fMRI or dMRI imaging data. After briefly analyzing another atlas-based fMRI dataset, several nodes were highly colinear with others, leading to a rankdeficient covariance matrix whose inversion made the algorithm unstable. The refactoring of the fMRI data into IC components, as in the HCP dataset and commonly performed in restingstate studies, alleviated the colinearity problem. An alternative approach is simply to discard the dependent nodes as they provide no additional information when perfectly dependent. Algorithms that can invert a rank-deficient matrix could also be applied. For example, the Moore-Penrose pseudoinverse achieves the leastsquares estimation of Eq. 3. However, the inverse of a rankdeficient matrix is not unique, which could result in an infinite number of connectivity estimates. Further constraints such as connection balancing and wiring costs are needed to arrive at a unique estimate.
Another limitation, for both macroscopic brain simulations and HCP fMRI analysis, is that the "ground-truth" matrix (dMRI connectivity) is symmetric due to limitations of diffusion imaging, which precluded harnessing DDC's capacity to estimate directed connections. Tract tracing, on the other hand, could provide directed structural connections. Two connectome matrices available from macaque brains (figure 2A in ref. 36 and figure 11B in ref. 37) are also nearly symmetric. Alternatively, directionality can be disambiguated by taking into account known physiological constraints on the directed connections.
The nonlinear version of DDC is underconstrained and more accurate connectivity estimates could be made by adapting the nonlinear function to the specific process generating the neural activity and recordings. For example, a hemodynamic response kernel could be used for fMRI recordings. Here, we adopted a ReLU nonlinearity, which has been widely used to model nonlinear rate-based network models, but its performance on the spiking network model was no better than that on the linear model. A better model for a spiking network would take into account the spiking mechanism and dynamical time constants. (See M aterials and M ethods, 1.G.4 for further discussion on how to modify nonlinear DDC for spiking networks.) In principle, we could also optimize the nonlinear function in a function path space by applying variational calculus. A simpler approach would be to express R(x ) as a linear combination of nonlinear basis functions (such as B-splines) and then optimize the linear coefficients based on model evidence.
Although we focused on the application of DDC to restingstate fMRI, it could also be used to analyze fMRI or EEG time-series data collected during a cognitive task. Brain states during dynamical tasks are often nonstationary. For example, attentional mechanisms and neuromodulatory processes can result in fluctuations of the mean and variance of neural activity. Since DDC does not assume stationarity, it should be robust to these fluctuations and could thus be useful in revealing task-related changes in brain connectivity. This can be implemented by a sliding-window analysis, although temporal precision may be limited as DDC typically requires more data points (on the order of 10 3 samples) than simpler correlation methods. This limitation can be ameliorated by recording at a high sampling frequency, which is possible for EEG. We also plan to explore the bilinear approximation of dynamical systems in the DCM framework (8). This could reveal not only the endogenous connectivity but also the effects of experimental manipulations on communication between brain regions.
In conclusion, DDC has a number of favorable mathematical properties that should ensure robust estimation of connectivity from a wide range of noisy and nonstationary recordings. Access to the directionality of neural connections opens additional avenues for interpreting the causal flow of information through networks. Identifying connectivity based on dynamical systems models makes direct contact with similar approaches in many other disciplines such as bioengineering, control theory, and network science. DDC should have a broad impact on studies in these areas whenever there is need for estimating directional network connectivity from network activity. Table 1. A. Covariance-based estimators. The Cov and P matrices are

FC Estimators. All estimators and abbreviations are summarized in
where x is a column vector of the system variable and the operation , takes the outer product of two vectors and averages across time. In this paper, all time traces were z scored; thus, the covariance matrix is equivalent to correlation. The covariance matrix reveals only pairwise correlations but the partial covariance matrix controls for confounding effects, one step closer to a causal estimation of global connectivity. B. Regularized partial covariance. Under the assumption that the connectivity matrix is sparse, regularization methods could be implemented during the regression step of calculating the partial covariance matrix. For example, we calculated L1-and L2-regularized partial covariance matrices through the FMRIB (functional magnetic resonance imaging of the brain) software library (FSL) toolbox (38). To choose the regularization parameter (λ), we tested a range of them and chose the one with the best performance with a corresponding quantification metric. We tested λ = 5, 10, 20, 50, 100, 200 for L1 regularization while λ = 0.1, 0.2, 0.5, 1, 5 for L2 regularization. C. Mutual information. MI quantified the statistical dependence of two random variables beyond second-order statistics. It is a model-free estimation of dependencies and therefore it should work equally well for both linear and nonlinear simulations. We used the implementation in the Functional Connectivity Toolbox (15) to estimate partial MI since we want to estimate the global network structure. D. Granger causality. Granger causality (12) defines a statistical interpretation of causality based on predictability: A is said to "Granger cause" B if the predictability of B declines when A is removed from the predictors. The test of predictability increase or decline is usually implemented through multivariate vector autoregressive modeling. We implemented conditional Granger causality through the multivariate Granger causality (MVGC) MATLAB toolbox (39). In this approach, the fundamental assumption is that the time-series data are a realization of a stationary vector autoregressive (VAR) process. The VAR model order was chosen based on the Akaike information criterion and the coefficients of the full/reduced regression model were computed through an ordinary leastsquares solution to the regression. E. Spike-train cross-correlograms. Following Das and Fiete (40) and Guisti et al. (41), we calculated the cross-correlograms based on their Pearson correlation and averaged within a time window (τ = 0.1 s) to get a spikebased connectivity measure C spk . Specifically, for spike-train x i and x j , where the superscript bar denotes the centered version of the spike train. Then the connectivity value from j to i was evaluated as For numerical calculation, we used the sampled time interval (dt = 0.5 ms) to evaluate the integral. For neuron pairs whose firing rates are smaller than 0.1 Hz, their connectivity value was assigned as zero. Around 2,000 connection pairs were skipped in SI Appendix, Fig. S3C. F. dCov estimators. Differential covariance (Δc) was calculated as Eq. 8 where dx dt was numerically computed using a symmetric difference quotient (42). The evaluation of partial differential covariance (Δp) was derived in parallel to partial covariance. The calculation was performed elementwise as in Eq. 9 where Cov refers to the covariance matrix, and K denotes the set of all nodes except i and j: G. DDC. The definitions of ΔL, ΔR, and ΔReLU can be found in the main text. The parameter θ for ΔReLU was varied from the 5th percentile to the 95th percentile of the z-scored data. The optimal value was chosen based on either the estimation errors (three-neuron simulations) or the AUC values (LIF neuron simulations). In the brain surface model, θ was set to zero. G.1. LSE of the system equations. The DDC estimators are actually leastsquares estimation of the assumed system equations. Let us use the linear system equation as an example: To achieve the minimum of square error (L),we take the derivative of L with respect to W: Then setting ∂L ∂W = 0, we obtained the LSE of W:

[12]
In the nonlinear case, the above derivation process holds by replacing xt with R(xt). In this view, we could refer to the well-developed statistical theory to improve technical issues such as the singularity of the covariance matrix and the regularization procedure. G.2. DDC derivation for stochastic network models. To model the randomness in the recorded neural activities, we used stochastic differential equations (SDEs) and evaluated DDC in this stochastic framework: where β is a multidimensional Brownian motion with variance Q and noise structure D influencing the state variable x . See Eq. 1 for the definitions of the other terms. The time averages ( x := T t=0 xt) are different from ensemble averages [E(x)] under nonstationary conditions, as analyzed in the next section.
Operating on both sides of this equation with , x ,

[14]
To evaluate dβ dt , x , we first write down the explicit solution of the linear SDE starting at t = 0 and then time average both sides: The first term W T is the summation of time-dependent linear Brownian increments, and thus the mean is zero and the variance is a time-dependent scaling of the Brownian variance: [16] The second term B T was evaluated using the Ito integral. Because Brownian motion is nowhere differentiable on its path, we numerically approximated the time derivative that we used in the simulations. If we assume {t k } ∞ k=1 as a partition of [0, t] whose partition size is infinitesimal as n → ∞ ,we can compute the Ito integral in the limit. For simplicity, define Φτ = exp(W(t − τ ))D. Then the two terms are composed of nonoverlapping Brownian increments from which it follows that E(Bt) = 0 because Brownian motion has independent and stationary increments: [18] Taken together, the first-order statistics of our linear DDC estimator become [

19]
This derivation confirms that DDC is unbiased in the presence of noise as a linear combination of Brownian motion. Simulations of the linear three-neuron model revealed that dβ dt , x is at least 10 times smaller in magnitude than x, x even for very high noise variance Q.Remarkably,DDC can still recover the groundtruth connectivity for correlated noise structure (D in Eq. 13). G.3. Nonstationary conditions. A continuous-time stochastic solution of the SDE, {xt} T t=0 , is stationary when its finite-dimensional joint distribution is time invariant, which implies that its mean and covariance remain constant across time. Only under the stationary assumption can the covariance matrix then be estimated by the time-averaged sample covariance.
The above SDE framework allows the mean and covariance of state variables to vary with time according to the Ito formula Under nonstationary conditions, x, x is no longer a valid estimate of the covariance matrix. Because the system equation holds at every time step regardless of stationarity, our DDC estimators remain valid and unbiased. These properties make DDC a robust and efficient estimator of FC. G.4. Modifications of the nonlinear estimator. Models of spiking neurons have several features that are not included in the models studied in this paper. First, decay time constants for neurons and synapses are an important part of their dynamics, such as the leaky dynamics of LIF models. Consider the following linear and nonlinear system equations that include membrane time constants for temporal filtering: For the linear case, DDC estimation differs only up to a scaling factor and the diagonal terms: For the nonlinear system, the time decay introduces a new term: where ΔD is a different DDC estimator of W for the temporally filtered nonlinear system. It is proportional to a weighted average of dCov and Cov. The time constant in Eq. 25 is a free parameter that can be estimated or optimized. This approach allows for a better match with the circuit mechanisms that generate the activity and how that activity is transformed by the recording techniques.
A second modification to DDC concerns the sparsity of the spiking. For the simulated LIF network,the sparse spike trains were filtered with decaying synaptic currents. As a consequence, R(x), x was rank deficient and thus noninvertible. One potential remedy is to find the least-squares solution of a sequence of linear equations to optimize W. This will be explored in a future study.

Simulations of Neural Systems.
A. Neural motif dynamics. We tested the performance of these methods in networks structured to have typical false positive motifs-chain (Fig. 3 A and B) and confounder (Fig. 3C)-with different dynamics and in another Rössler chaotic system. To stabilize the simulation, all nodes had decaying dynamics and they were linked by inhibitory connections. Specifically, the diagonal entries in the ground-truth matrix were set to −1 and connection strength was set to −0.5. We tested connection strength from −0.1 to −1 and it did not affect the estimation results qualitatively.
For linear dynamics, system variable x was simulated through Euler integration according to Eq. 26, where u is the Gaussian-distributed random drive and is the Gaussian-distributed observational noise, both independent from x. The integration step is 0.01 s and the length of simulation is 1,000 s unless otherwise specified: x obs = x + , ∼ N (0, σ 2 obs ).

[26]
For nonlinear dynamics, simulation was governed by Eq. 27, where we used a centered sigmoid function to simulate the nonlinearity. The sigmoid function was shifted to have mean of zero because otherwise the inhibitory signal would be too strong in the network and the signal would decay to zero in a short time interval. In the expression of R(x), slope α controls the level of nonlinearity in the network and was set to 1 by default. Note the mismatch between simulation nonlinearity and the estimation nonlinearity. The integration step is 0.1 ms and signals were down-sampled to 100 Hz after estimation. Simulation length is 1,000 s unless otherwise mentioned: .

[27]
The equations for the Rössler system are where x = [x 1 , x 2 , x 3 ] T , a = b = 0.2, and c = 5.7. This set of parameters was originally used by Rössler to study the behavior of its chaotic dynamics. The signal was integrated at the step of 0.01 s for 1,000 s. The first 100 s of transient dynamics were discarded.
To simulate a nonstationary system with time-invariant connectivity pattern, we borrowed the idea of hidden Markov models and simulated a two-state dynamical system. The system was simulated for 1,000 s using Euler integration governed by Eq. 29. Up until 500 s, the noise structure remained to be the identity matrix. After 500 s, we switched the noise structure to D 2 . The steady-state covariance structure of these two states could be analytically evaluated using Eq. 21 and is plotted in Fig. 2A: [29] B. Sparse LIF network. The connectivity matrix was constructed as an Erdös-Rényi random graph: Two nodes being connected has probability equal to network sparsity. All connected edges were assigned to have the same strength. Thus, the connectivity matrices were parameterized by only sparsity level and connection strength. LIF neurons could be described by Eq. 30 with doubleexponential filtered synapses (29).Once membrane voltage V reaches a threshold V thres , the neuron will emit a spike and reset the membrane potential to Vreset. The spike train was described by t k <t δ(t − t k ) and then filtered to generate synaptic current r i . We used subthreshold membrane potential as the system variable (x) of interest. We simulated networks with 200 neurons. The integration process was performed at the step of 0.05 ms, down-sampled to 2,000 Hz, and simulated for 20 s unless otherwise mentioned: [30] C. Anatomically supported brain surface model. Regional brain activities were simulated using a reduced Wong-Wang model (30) (Eq. 31), where state variable S i denotes the dynamics of the synaptic gating variable and H(x i ) is the population firing rate. This model is a dynamic mean-field approximation of numerous local spiking units and was applied to efficiently model restingstate dynamics. It involves both self-excitation and long-range experimentally measured connections. The state variable was simulated at 1,000 Hz and the simulation length was 100 s: [31] (40), we decomposed the estimation error into variance and bias (SI Appendix, Fig. S1). In most cases, the estimation is different from the ground-truth matrix by a scale. So we normalized both estimated and ground-truth matrices between −1 and 1. In addition, dCovbased estimators are directed estimators while covariance-based ones are not. For fair comparison, we considered only the estimation of the lower triangle part where all ground-truth connections are located. After scaling and lower triangle restriction, estimation error, variance, and bias were calculated as Eq. 32, where W,Ŵ, andW are ground-truth matrix, estimated matrix, and the average of estimated matrices across trials and ||.|| is the vector L2 norm. It is easy to verify that Error 2 = Bias 2 + Variance 2 and thus the vector forms of bias and variance are orthogonal to each other. We measure the relative contribution of bias by the angle (θ b , Eq. 33) between the vectors associated with bias and variance.Fifty repetitive trials were used across all simulations:

A. Variance and bias. Following Das and Fiete
). [33] B. Sensitivity and specificity. To evaluate the estimation performance in LIF networks, connection recovery sensitivity and specificity were calculated since the networks have sparse connection and the connection strengths are uniform. To be more specific, the estimated matrices were binarized based on their absolute values to determine the existence of connections, which were then compared with the ground-truth connections. We used the absolute value because we cared only about the presence of a connection. Sensitivity and specificity were calculated as the true positive rate and one minus the false positive rate. Varying the binarization threshold gave rise to the receiver operator curve (ROC). The area under the ROC, calculated by trapezoidal integration, indicates the method's general performance in classifying connections. For performance evaluation in the brain surface model, c-sensitivity (6) (equation 17 in ref. 43) was adopted. It is defined as the fraction of the estimated true positive values that are higher than the 95th percentile of the false positive values. Like ROC, c-sensitivity quantitatively estimated how sensitive methods are to estimating the presence of a connection. Thus, the absolute values of the estimated matrices were used here.

A. Extracting time traces from rs-fMRI recordings.
We used the extensively processed "HCP1200 Parcellation + Timeseries + Netmats (1003 Subjects)" dataset available through the website (https://www.humanconnectome.org). Detailed preprocessing and study design could be easily accessed through the website. In this release, 1,003 healthy adult human subjects (ages 22 to 37 y, 534 females) were scanned on a 3-T Siemens connectome-Skyra scanner (customized to achieve 100 mTm −1 gradient strength). Each subject underwent 4 × 15 min recording sessions with temporal resolution of 0.73 s and spatial resolution of 2 mm isotropic.
For imaging data processing, each 15-min run of each subject's rs-fMRI data was preprocessed according to Smith et al. (44); it was minimally preprocessed (45) and had artifacts removed using ICA and FMRIB's ICA-based X-noiseifier (FIX) (46,47). Intersubject registration of cerebral cortex was carried out using areal-feature-based alignment and the multimodal surface matching algorithm ("MSMAll") (48,49). Each dataset was temporally demeaned and had variance normalization and then was fed into the MIGP algorithm, whose output is the top 4,500 weighted spatial eigenvectors from a group-averaged principal component analysis (PCA) (a very close approximation to concatenating all subjects' time series and then applying PCA) (50). The MIGP output was fed into group ICA using FSL's MELODIC tool, applying it at several different dimensionalities (D = 25, 50, 100, 200, 300). In our analysis, we used the 100-dimension decomposition.
For a given parcellation (group-ICA map), the ICA spatial maps were used to derive one representative time series per IC per subject. This process was fulfilled by the standard "dual-regression stage-1" approach, in which the full set of ICA maps was used as spatial regressors against the full data (51). This results in an N × T (number of components × number of time points) matrix for each subject. Thus, we consider each IC as a network node. B. Intra-and interindividual variability. The intraindividual variability was quantified as the correlation between DDC estimations across scans. Since DDC requires data points from at least two scans to achieve a relatively high consistency, we concatenated data from two scans for DDC estimation and compared them with the estimation results from the other two remaining scans (Fig. 5C). For interindividual variability, estimation results of all six different combinations of concatenation were compared. C. Significance test of the estimated connections. To assess the statistical significance of the estimated connection, we used an autoregressive (AR) bootstrap procedure (52,53) to preserve the power spectrum density of BOLD signals. For a specific estimated connection, denoted as element (i, j), our null hypothesis was that signals x i and x j are independent regardless of other nodes' influence. To generate null time series, we fitted separate AR processes of model order q to node-specific time traces. The model order q was determined according to the Bayesian information criterion (BIC). A higher-order model was rejected if it could not decrease BIC by more than 2. Using the estimated AR coefficients of empirical time series, we generated 1,000 surrogate null time series and then computed the associated FC corresponding to the null hypothesis. For each connection, we assumed a Gaussian distribution of the null connectivity values generated from null time traces. The P value was calculated as the probability of the empirical value that appeared under the null Gaussian distribution. In this paper, we adopted a sequence of significance levels to binarize the matrix so that we could investigate the network behavior asymptotically. D. Individual-level dMRI strength. To compare the FC metrics to the underlying corticocortical white matter connectivity, we reorganized our previously published diffusion-MRI-based structural connectome (31) in which connectivity was assessed among the 360 cortical areas of the HCP multi-modal parcellation version 1.0 (MMP1.0) atlas (49). Of the 100 IC nodes, 46 are composed of at least 40% cortical voxels (SI Appendix, Fig. S8A) and as the dMRI connectome was restricted to corticocortical relationships, we limited the scope of our analyses to these nodes. Because the IC nodes have a greater spatial extent than the atlas areas, each is composed of several areas, in whole or in part (mean = 28.3 areas).
For each IC node pair, dMRI connectivity was assessed by obtaining the average of the nodes' constituent interareal connectivity weighted by the fraction of the node pair's voxels assigned to each areal pair. In cases where an atlas area was partially present in both IC nodes of a pair, that area was excluded from the mean as short-range intra-areal anatomical connectivity was not available. Data Availability. See SI Appendix, Table S1 and Figs. S1-S9 for supporting information. Implementation of DDC, network simulation, and HCP processing scripts are all available through GitHub (https://github.com/yschen13/DDC). Previously published data were used for this work (Human Connectome Project).